Benchmark datasets

We retrieved and curated 82 high-quality datasets from Gemma, focusing on case-control observational studies. These datasets, originally sourced from GEO, were curated to ensure reliable gene expression data. Inclusion criteria were at least three samples per condition, no drug treatment, batch effects either corrected or not detected, and at least 15 differentially expressed genes (FDR < 0.2). DNA microarray data were quantile-normalized and log-transformed, whereas RNA-Seq data were obtained as log2-transformed counts per million reads. Probes were mapped to gene IDs, and values were averaged for genes with multiple probes. Quality checks and further analyses are detailed in the Supplementary Materials.
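The microarray preprocessing steps described above (quantile normalization, log transformation, and averaging of probes per gene) can be illustrated with a minimal sketch. It is not the Gemma pipeline: the function names, the list-of-rows data layout, the pseudocount, and the naive tie handling are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Every sample (column) is sorted, the mean across samples is taken at each
    rank, and each value is replaced by the mean of its rank. Ties are broken
    by row order, which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Row indices of each column, ordered by increasing value.
    orders = [
        sorted(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)
    ]
    # Mean of the k-th smallest value across all columns.
    rank_means = [
        sum(matrix[orders[c][k]][c] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            normalized[r][c] = rank_means[k]
    return normalized


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(value + pseudocount); the pseudocount is an assumption."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def average_probes_per_gene(probe_ids, matrix, probe_to_gene):
    """Average the rows of all probes that map to the same gene ID."""
    grouped = defaultdict(list)
    for probe, row in zip(probe_ids, matrix):
        gene = probe_to_gene.get(probe)
        if gene is not None:  # unmapped probes are dropped
            grouped[gene].append(row)
    return {
        gene: [sum(col) / len(rows) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }
```

After quantile normalization every sample shares the same set of values, so samples become directly comparable before probes are collapsed to genes.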

GEO Disease/Target pathway Nr. of samples Genome coverage Nr. of DEGs Batch effect
GSE10586 Type I diabetes mellitus 27 (12/15) 18023 25 (18/7) Corrected
GSE41762 Type I diabetes mellitus 77 (20/57) 18094 500 (303/197) Corrected
GSE10715 Colorectal cancer 30 (19/11) 18210 500 (244/256) Corrected
GSE13067 Colorectal cancer 72 (11/61) 18722 500 (279/221) Corrected
GSE31737 Colorectal cancer 79 (40/39) 15188 500 (292/208) Corrected
GSE4107 Colorectal cancer 22 (12/10) 18618 500 (297/203) Corrected
GSE49355 Colorectal cancer 56 (39/17) 11233 500 (192/308) Corrected
GSE50117 Colorectal cancer 18 (9/9) 10100 500 (311/189) Not detected
GSE10810 Breast cancer 58 (31/27) 18543 500 (77/423) Corrected
GSE26304 Breast cancer 115 (109/6) 16638 500 (107/393) Corrected
GSE10927 Renal cell carcinoma 64 (54/10) 18392 500 (94/406) Corrected
GSE15641 Renal cell carcinoma 92 (69/23) 11227 500 (103/397) Corrected
GSE33371 Renal cell carcinoma 64 (54/10) 18392 500 (94/406) Corrected
GSE53757 Renal cell carcinoma 143 (71/72) 18268 500 (173/327) Corrected
GSE11682 Prostate cancer 33 (17/16) 19547 59 (28/31) Corrected
GSE22260 Prostate cancer 30 (20/10) 15837 500 (329/171) Not detected
GSE30521 Prostate cancer 22 (17/5) 15150 500 (292/208) Corrected
GSE12643 Type II diabetes mellitus 20 (10/10) 7513 62 (49/13) Corrected
GSE13760 Type II diabetes mellitus 21 (10/11) 10994 29 (10/19) Corrected
GSE15653 Type II diabetes mellitus 18 (13/5) 11157 500 (303/197) Corrected
GSE20966 Type II diabetes mellitus 20 (10/10) 18819 240 (111/129) Corrected
GSE21340 Type II diabetes mellitus 20 (5/15) 3499 444 (176/268) Corrected
GSE38642 Type II diabetes mellitus 63 (9/54) 18105 500 (189/311) Corrected
GSE40234 Type II diabetes mellitus 62 (34/28) 19098 452 (202/250) Corrected
GSE12685 Alzheimer disease 13 (6/7) 11141 500 (222/278) Corrected
GSE28146 Alzheimer disease 29 (21/8) 17912 500 (171/329) Corrected
GSE36980 Alzheimer disease 79 (32/47) 17837 500 (141/359) Corrected
GSE37263 Alzheimer disease 16 (8/8) 15133 500 (169/331) Corrected
GSE39420 Alzheimer disease 21 (14/7) 17947 500 (162/338) Not detected
GSE4757 Alzheimer disease 20 (10/10) 18801 94 (65/29) Corrected
GSE95587 Alzheimer disease 117 (84/33) 18297 500 (211/289) Not detected
GSE97760 Alzheimer disease 19 (9/10) 10779 500 (397/103) Not detected
GSE14580 Inflammatory bowel disease 14 (8/6) 18612 500 (273/227) Corrected
GSE22619 Inflammatory bowel disease 20 (10/10) 18819 500 (201/299) Corrected
GSE36807 Inflammatory bowel disease 35 (28/7) 18554 500 (193/307) Not detected
GSE14858 Acute myeloid leukemia 39 (20/19) 18337 500 (293/207) Corrected
GSE15605 Melanoma 74 (58/16) 18045 500 (145/355) Not detected
GSE16499 Dilated cardiomyopathy 30 (15/15) 15095 500 (218/282) Corrected
GSE3586 Dilated cardiomyopathy 28 (13/15) 3765 500 (231/269) Not detected
GSE42955 Dilated cardiomyopathy 29 (24/5) 17854 500 (169/331) Corrected
GSE16515 Pancreatic cancer 52 (36/16) 18361 500 (374/126) Corrected
GSE18670 Pancreatic cancer 23 (11/12) 18656 320 (187/133) Corrected
GSE23397 Pancreatic cancer 21 (15/6) 15168 500 (218/282) Corrected
GSE28735 Pancreatic cancer 90 (45/45) 18141 500 (378/122) Not detected
GSE42952 Pancreatic cancer 23 (11/12) 18655 500 (131/369) Not detected
GSE18838 Parkinson disease 28 (17/11) 14894 500 (134/366) Corrected
GSE19587 Parkinson disease 22 (12/10) 11219 500 (164/336) Corrected
GSE20141 Parkinson disease 18 (10/8) 17802 500 (452/48) Corrected
GSE20146 Parkinson disease 19 (10/9) 17870 132 (25/107) Corrected
GSE20163 Parkinson disease 17 (8/9) 11335 500 (177/323) Corrected
GSE20164 Parkinson disease 11 (6/5) 11239 195 (64/131) Corrected
GSE20291 Parkinson disease 35 (15/20) 11341 500 (227/273) Corrected
GSE20292 Parkinson disease 29 (11/18) 11335 500 (232/268) Corrected
GSE20314 Parkinson disease 8 (4/4) 11118 58 (38/20) Corrected
GSE20333 Parkinson disease 12 (6/6) 6756 298 (226/72) Corrected
GSE7621 Parkinson disease 25 (16/9) 18430 500 (204/296) Corrected
GSE90514 Parkinson disease 8 (4/4) 14486 128 (84/44) Not detected
GSE18842 Non-small cell lung cancer 91 (46/45) 18683 500 (205/295) Corrected
GSE19188 Non-small cell lung cancer 156 (91/65) 18484 500 (141/359) Corrected
GSE19804 Non-small cell lung cancer 119 (59/60) 18682 500 (129/371) Corrected
GSE20189 Non-small cell lung cancer 162 (81/81) 10949 500 (188/312) Corrected
GSE21933 Non-small cell lung cancer 42 (21/21) 18941 500 (161/339) Not detected
GSE27262 Non-small cell lung cancer 50 (25/25) 18369 500 (100/400) Corrected
GSE52248 Non-small cell lung cancer 18 (12/6) 15210 500 (153/347) Not detected
GSE19187 Asthma 38 (27/11) 17803 500 (366/134) Corrected
GSE23552 Asthma 39 (26/13) 15210 500 (290/210) Corrected
GSE27011 Asthma 54 (36/18) 17634 500 (273/227) Corrected
GSE28619 Alcoholic liver disease 22 (15/7) 18464 500 (331/169) Not detected
GSE30153 Systemic lupus erythematosus 26 (17/9) 18065 20 (5/15) Corrected
GSE50635 Systemic lupus erythematosus 48 (32/16) 17642 293 (214/79) Corrected
GSE31189 Bladder cancer 92 (52/40) 18649 120 (50/70) Corrected
GSE36389 Endometrial cancer 19 (13/6) 11210 500 (151/349) Corrected
GSE38476 Hepatocellular carcinoma 20 (10/10) 13131 500 (288/212) Not detected
GSE54236 Hepatocellular carcinoma 160 (80/80) 16823 500 (300/200) Corrected
GSE40184 Hepatitis C 18 (10/8) 11099 500 (240/260) Corrected
GSE43754 Chronic myeloid leukemia 19 (9/10) 15020 500 (308/192) Not detected
GSE45516 Huntington disease 9 (6/3) 18331 500 (339/161) Not detected
GSE64810 Huntington disease 69 (20/49) 16046 500 (338/162) Not detected
GSE73655 Huntington disease 20 (13/7) 19881 65 (39/26) Not detected
GSE48850 Thyroid cancer 11 (6/5) 15665 500 (249/251) Not detected
GSE55235 Rheumatoid arthritis 20 (10/10) 11131 500 (320/180) Corrected
GSE5808 Measles 18 (15/3) 11156 500 (135/365) Corrected

Disease Pathway Network

We introduce the Disease Pathway Network to overcome the shortcoming of the “single target pathway” approach and to improve the sensitivity assessment in an unbiased way. We used HumanNet-XC, a comprehensive functional network of human genes, to analyze inter-pathway connectivity. Inter-pathway connectivity (IPC) was quantified as the sum of direct links and shared neighbors between the genes of two pathways, and its significance was tested by subsampling. Inter-pathway overlap was assessed using the Jaccard index. Pathway pairs with Benjamini-Hochberg (BH) FDR-corrected p-values < 0.05 in both tests were retained.
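The quantities named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the network is assumed to be an undirected adjacency dictionary, IPC is assumed to be counted over all cross-pathway gene pairs, and the null model in `subsampling_p_value` (random gene sets of the same size as the second pathway) is a guess at what "subsampling" means here. All function names are hypothetical.

```python
import random


def jaccard_index(pathway_a, pathway_b):
    """Overlap of two gene sets: |A & B| / |A | B|."""
    a, b = set(pathway_a), set(pathway_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inter_pathway_connectivity(network, pathway_a, pathway_b):
    """Sum of direct links and shared neighbors over cross-pathway gene pairs.

    network: dict mapping each gene to the set of its neighbors (undirected).
    """
    a, b = set(pathway_a), set(pathway_b)
    direct_links = 0
    shared_neighbors = 0
    for gene_a in a:
        neighbors_a = network.get(gene_a, set())
        for gene_b in b:
            if gene_a == gene_b:
                continue  # a gene shared by both pathways is not a pair
            if gene_b in neighbors_a:
                direct_links += 1
            neighbors_b = network.get(gene_b, set())
            shared_neighbors += len((neighbors_a & neighbors_b) - {gene_a, gene_b})
    return direct_links + shared_neighbors


def subsampling_p_value(network, pathway_a, pathway_b, n_draws=1000, seed=0):
    """Empirical p-value of the observed IPC against random gene sets (assumed null)."""
    rng = random.Random(seed)
    genes = sorted(network)
    observed = inter_pathway_connectivity(network, pathway_a, pathway_b)
    size = min(len(set(pathway_b)), len(genes))
    hits = 0
    for _ in range(n_draws):
        random_set = rng.sample(genes, size)
        if inter_pathway_connectivity(network, pathway_a, random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway pair would then be kept only if its adjusted p-value is below 0.05 for both the connectivity test and the overlap test.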

Pathway name Pathway ID Pathway subclass (Human diseases) 10 20 40 ALL
Acute myeloid leukemia hsa05221 Cancer: specific types 11 21 43 129
Alcoholic liver disease hsa04936 Endocrine and metabolic disease 11 21 41 64
Alzheimer disease hsa05010 Neurodegenerative disease 11 21 41 82
Asthma hsa05310 Immune disease 11 21 41 41
Bladder cancer hsa05219 Cancer: specific types 11 21 41 108
Breast cancer hsa05224 Cancer: specific types 11 21 42 109
Chronic myeloid leukemia hsa05220 Cancer: specific types 11 22 41 129
Colorectal cancer hsa05210 Cancer: specific types 11 21 41 128
Dilated cardiomyopathy hsa05414 Cardiovascular disease 11 21 41 66
Endometrial cancer hsa05213 Cancer: specific types 11 21 42 122
Hepatitis C hsa05160 Infectious disease: viral 12 21 41 128
Hepatocellular carcinoma hsa05225 Cancer: specific types 11 21 41 110
Huntington disease hsa05016 Neurodegenerative disease 11 21 23 23
Inflammatory bowel disease hsa05321 Immune disease 11 22 41 79
Measles hsa05162 Infectious disease: viral 11 21 41 113
Melanoma hsa05218 Cancer: specific types 11 21 41 114
Non-small cell lung cancer hsa05223 Cancer: specific types 12 21 41 129
Pancreatic cancer hsa05212 Cancer: specific types 11 21 41 135
Parkinson disease hsa05012 Neurodegenerative disease 11 21 28 28
Prostate cancer hsa05215 Cancer: specific types 11 22 42 115
Renal cell carcinoma hsa05211 Cancer: specific types 11 21 41 129
Rheumatoid arthritis hsa05323 Immune disease 11 21 41 67
Systemic lupus erythematosus hsa05322 Immune disease 11 21 39 39
Thyroid cancer hsa05216 Cancer: specific types 11 21 41 108
Type I diabetes mellitus hsa04940 Endocrine and metabolic disease 11 21 41 43
Type II diabetes mellitus hsa04930 Endocrine and metabolic disease 11 21 41 125

Nr. of enrichments per dataset

Minimum, maximum, and median number of tested pathways in the positive benchmark.

Analysis of biases in the benchmarked methods

We assessed the EA methods’ performance on randomized data. Ideally, a method applied to randomized data yields pathway p-values that are uniformly distributed between 0 and 1, so that 5% of them fall below the 0.05 cutoff.
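The uniformity expectation can be checked numerically. The sketch below simulates ideal null p-values with a seeded generator (an assumption; in the benchmark they would come from running a method on randomized data) and reports the fraction below the cutoff together with a Kolmogorov-Smirnov distance from the uniform distribution. The function names are hypothetical.

```python
import random


def fraction_below(p_values, cutoff=0.05):
    """Fraction of p-values that fall below the significance cutoff."""
    return sum(p < cutoff for p in p_values) / len(p_values)


def ks_statistic_uniform(p_values):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF."""
    ordered = sorted(p_values)
    n = len(ordered)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ordered))


# Simulated p-values of an ideal method on randomized data (assumption).
rng = random.Random(42)
null_p_values = [rng.random() for _ in range(100_000)]

print("fraction below 0.05:", fraction_below(null_p_values))
print("KS distance from uniform:", ks_statistic_uniform(null_p_values))
```

For a well-calibrated method the first number stays close to 0.05 and the second close to 0; clear departures from either indicate bias under the null.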

Under the null hypothesis, EA methods often yield p-values that are biased toward 0, toward 1, or toward both extremes (a bimodal distribution). Such bias distorts the significance calls of the analysis. We therefore examined the p-value distribution of each method to determine whether it was right-skewed (biased toward 0) or left-skewed (biased toward 1). A right-skewed distribution can produce false positives by flagging pathways as affected when they are not. Conversely, a left-skewed distribution can produce false negatives by reporting pathways as non-significant when they are in fact affected.
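One simple way to label the direction of such bias is sketched below. It is a heuristic for illustration only, not the procedure used in the study: the thresholds `tail`, `excess`, and `skew_threshold` are arbitrary assumptions, and the function names are hypothetical. Excess mass in both tails is reported as bimodal; otherwise the sign of the sample skewness decides the label.

```python
def sample_skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def classify_p_value_bias(p_values, tail=0.05, excess=1.5, skew_threshold=0.2):
    """Label a null p-value distribution by the direction of its bias."""
    n = len(p_values)
    low = sum(p < tail for p in p_values) / n        # mass near 0
    high = sum(p > 1 - tail for p in p_values) / n   # mass near 1
    if low > excess * tail and high > excess * tail:
        return "bimodal"
    skew = sample_skewness(p_values)
    if skew > skew_threshold:
        return "right-skewed"  # biased toward 0, risk of false positives
    if skew < -skew_threshold:
        return "left-skewed"   # biased toward 1, risk of false negatives
    return "approximately uniform"
```

For example, cubing evenly spaced values in (0, 1) concentrates them near 0 and yields the "right-skewed" label, whereas the evenly spaced values themselves are labeled "approximately uniform".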

Stats at different numbers of TPs in the Disease Pathway Network

In this study, we used independent positive and negative benchmarks. In the positive benchmark, pathways called significant (p-value < 0.05) are true positives (TP), and pathways called non-significant (p-value ≥ 0.05) are false negatives (FN). In the negative benchmark, pathways called non-significant are true negatives (TN), and pathways called significant are false positives (FP). We created the negative benchmark by resampling the gene labels of each dataset from the genome while keeping the number of differentially expressed genes (DEGs) constant, so that the false positive rate (FPR) is estimated consistently across tests. To address the imbalance between positive and negative pathways, we calculated TN and FP in the negative benchmark only for the target-related pathways.
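The label-resampling idea behind the negative benchmark can be sketched as follows. This is a minimal illustration, assuming that each dataset is a mapping from gene ID to a per-gene result (here an FDR value) and that labels are drawn without replacement from a genome-wide gene list; the function name and data layout are hypothetical.

```python
import random


def resample_gene_labels(gene_stats, genome_genes, seed=None):
    """Reassign per-gene results to gene labels drawn at random from the genome.

    gene_stats: dict mapping gene ID -> per-gene result (e.g. FDR).
    genome_genes: iterable of candidate gene IDs; it must contain at least
    as many genes as gene_stats.
    The values themselves are untouched, so the number of DEGs is preserved
    while their link to real genes (and hence to pathways) is broken.
    """
    rng = random.Random(seed)
    new_labels = rng.sample(list(genome_genes), len(gene_stats))
    return dict(zip(new_labels, gene_stats.values()))


# Toy example with made-up gene IDs and FDR values.
observed = {"g1": 0.01, "g2": 0.50, "g3": 0.15, "g4": 0.90}
genome = [f"g{i}" for i in range(1, 101)]
randomized = resample_gene_labels(observed, genome, seed=7)
```

Because only the labels change, any pathway reported as significant on `randomized` is a false positive by construction.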

Using the TP, TN, FP, and FN definitions, we derived the true positive rate (TPR, or sensitivity), TP / (TP + FN), and the true negative rate (TNR, or specificity), TN / (TN + FP), which equals 1 − FPR. We computed the geometric mean of TPR and TNR (G-mean) as a single summary of performance. Additionally, we assessed the median relative rank of TPs among the top predictions, assigning tied pathways the average of their ranks.
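These summary statistics can be written out in a few lines. The sketch below assumes that pathway p-values from the positive and negative benchmarks are available as plain lists and that the relative rank is the average rank divided by the number of ranked pathways; that normalization and all function names are assumptions.

```python
import math
import statistics


def benchmark_rates(positive_p_values, negative_p_values, cutoff=0.05):
    """TPR, TNR, FPR and G-mean from positive and negative benchmark p-values."""
    tp = sum(p < cutoff for p in positive_p_values)
    fn = len(positive_p_values) - tp
    fp = sum(p < cutoff for p in negative_p_values)
    tn = len(negative_p_values) - fp
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return {"TPR": tpr, "TNR": tnr, "FPR": 1 - tnr, "G-mean": math.sqrt(tpr * tnr)}


def average_ranks(p_values):
    """1-based ranks by ascending p-value; tied values share their mean rank."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    ranks = [0.0] * len(p_values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and p_values[order[end + 1]] == p_values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def median_relative_rank(p_values, is_target):
    """Median of rank / number of pathways over the target pathways."""
    ranks = average_ranks(p_values)
    relative = [r / len(p_values) for r, flag in zip(ranks, is_target) if flag]
    return statistics.median(relative)
```

The G-mean rewards methods that are both sensitive and specific: it drops to 0 if either TPR or TNR is 0, which a plain average of the two would not do.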

Stats for the top 20 TPs in the Disease Pathway Network

To ensure a balanced representation of the benchmarked disease pathway subnetworks, we limited each subnetwork to the top 20 pathways linked to its target disease pathway. We made this choice because relatively few associations in total were observed for Parkinson disease and Huntington disease.
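Restricting a subnetwork to its top-ranked links amounts to a sort and a cut. The sketch below assumes that linked pathways are ranked by a connectivity score to the target pathway, which the text does not state; the score values, the pathway IDs, the tie-breaking by ID, and the function name are all hypothetical.

```python
def top_linked_pathways(link_scores, k=20):
    """Return the k pathways with the highest link score to the target pathway.

    link_scores: dict mapping pathway ID -> score of its link to the target.
    Ties are broken alphabetically by pathway ID to keep the result stable.
    """
    ranked = sorted(link_scores.items(), key=lambda item: (-item[1], item[0]))
    return [pathway for pathway, _ in ranked[:k]]


# Toy example with made-up pathway IDs and scores.
scores = {"P1": 5.0, "P2": 9.0, "P3": 7.0, "P4": 7.0}
print(top_linked_pathways(scores, k=3))
```

A subnetwork with fewer than k linked pathways is returned whole, so small subnetworks are unaffected by the cut.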

Runtime

We conducted scalability tests for each method in the benchmark using KEGG as input on the 82 datasets. Some methods support parallelization and can process multiple datasets simultaneously, which reduces elapsed time when many datasets are tested in batch. The analysis was run on macOS Monterey (v12.5.1) with an Apple M1 processor (16 GB RAM), except for BinoX, which was run on Ubuntu (v18.04.6) with an Intel Core i7-2600 3.40 GHz processor (16 GB RAM). For GSEA, the results report elapsed time.